Learning Zone

Courses@HKUST

  • EESM5060 Embedded Systems
  • ELEC6910A First Principles of CV

Books

  • Digital Integrated Circuits, A Design Perspective. Second Edition
  • CMOS VLSI Design, A Circuits and Systems Perspective. Fourth Edition
  • Verilog Digital System Design. Second Edition
  • Computer Architecture, A Quantitative Approach. Sixth Edition
  • 动手学深度学习 (Dive into Deep Learning), Release 2.0.0-beta1
  • 神经网络加速器的计算架构及存储优化技术研究 (Research on Computing Architectures and Memory Optimization Techniques for Neural Network Accelerators; thanks to Prof. Fengbin TU, the author, for gifting me this book.)

Papers

  • AutoDCIM: An Automated Digital CIM Compiler
  • Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks
  • DaDianNao: A Machine-Learning Supercomputer
  • Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
  • Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
  • PRIME: A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory
  • Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips
  • Stripes: Bit-Serial Deep Neural Network Computing
  • A 4nm 6163-TOPS/W/b 4790-TOPS/mm2/b SRAM Based Digital-Computing-in-Memory Macro Supporting Bit-Width Flexibility and Simultaneous MAC and Weight Update
  • An 89TOPS/W and 16.3TOPS/mm2 All-Digital SRAM-Based Full-Precision Compute-In Memory Macro in 22nm for Machine-Learning Edge Applications
  • A 5-nm 254-TOPS/W 221-TOPS/mm2 Fully-Digital Computing-in-Memory Macro Supporting Wide-Range Dynamic-Voltage-Frequency Scaling and Simultaneous MAC and Write Operations
  • A 12nm 121-TOPS/W 41.6-TOPS/mm2 All Digital Full Precision SRAM-based Compute-in-Memory with Configurable Bit-width For AI Edge Applications
  • An Ultra-Low-Voltage Bit-Interleaved Synthesizable 13T SRAM Circuit
  • All-Digital Time-Domain Compute-in-Memory Engine for Binary Neural Networks With 1.05 POPS/W Energy Efficiency
  • Multi-Function CIM Array for Genome Alignment Applications built with Fully Digital Flow
  • Compiling All-Digital-Embedded Content Addressable Memories on Chip for Edge Application
  • AI SoC Design in the Foundation Model Era
  • Benchmark and Modelling for SRAM based CIM
  • A Survey of Accelerator Architecture for DNNs
  • DIMC: 2219TOPS/W 2569F2/b Digital In-Memory Computing Macro in 28nm Based on Approximate Arithmetic Hardware
  • A 28nm 38-to-102-TOPS/W 8b Multiply-Less Approximate Digital SRAM Compute-In-Memory Macro for Neural-Network Inference
  • Approximate De-randomizer for Stochastic Circuits
  • Algorithm-Software-Hardware Co-Design for Deep Learning Acceleration
  • Timeloop: A Systematic Approach to DNN Accelerator Evaluation
  • DynaPlasia: An eDRAM In-Memory-Computing-Based Reconfigurable Spatial Accelerator with Triple-Mode Cell for Dynamic Resource Switching
  • A 28nm 11.2TOPS/W Hardware-Utilization-Aware Neural-Network Accelerator with Dynamic Dataflow
  • Understanding Reuse, Performance, and Hardware Cost of DNN Dataflows: A Data-Centric Approach (MAESTRO)
  • MNSIM 2.0: A Behavior-Level Modeling Tool for Processing-In-Memory Architectures
  • Towards Heterogeneous Multi-core Accelerators Exploiting Fine-grained Scheduling of Layer-Fused Deep Neural Networks
  • DIANA: An End-to-End Energy-Efficient DIgital and ANAlog Hybrid Neural Network SoC
  • MARS: Multimacro Architecture SRAM CIM-Based Accelerator With Co-Designed Compressed Neural Networks
  • Scalable and Programmable Neural Network Inference Accelerator Based on In-Memory Computing
  • Fused-Layer CNN Accelerators
  • Automatic Generation of Structured Macros Using Standard Cells - Application to CIM